Abstract

We consider the evolution of large but finite populations on arbitrary 
fitness landscapes. We describe the evolutionary process by a Markovian
Moran process. We show that, to $O(1/N)$, the time-averaged fitness is
lower for the finite population than it is for the infinite population. We also 
show that fluctuations in the number of individuals for a given genotype 
can be proportional to a power of the inverse of the mutation rate. Finally, 
we show that the probability for the system to take a given path through 
the fitness landscape can be non-monotonic in system size. 

1 Introduction 

Natural populations are characterized by finite sizes. For this reason, it is im- 
possible for biology to sample the entire space of all possible genotypes. Even 
the number of possible sequences with high fitness is typically much larger than 
the population size in naturally occurring populations. Effects due to finite pop- 
ulation size are particularly pronounced in asexual populations. For example, 
the reduction of fitness in a finite population without back mutation is termed 
Muller's ratchet [1], and the decreased speed of evolution in a finite population
without recombination is termed the Hill-Robertson effect [3].

The relative influence of different evolutionary forces changes between small 
and large populations. While stochastic effects such as genetic drift act more 
strongly on small populations, natural selection acts more effectively on large 
populations. Many results in classical population genetics have focused on the 
limiting cases of small or infinite populations. In sufficiently small populations, 
beneficial mutations occur but rarely survive long enough to become established 
in the population. Those mutations that survive, however, can spread through 
a small population, reaching fixation, before another beneficial mutation arises. 
This regime is referred to as successional-mutations regime [31 2j and is fairly 
well-understood. This theory has been useful, for example, to understand evo- 
lution of transcription factor binding sites [5] . As the population size increases, 
beneficial mutations arise more frequently. Fixation of individual mutations
does not occur before the arrival of another beneficial mutation. In asexual
populations this leads to competition between descendants of each of the mu- 
tations — an effect referred to as clonal interference [6]. As the population 
becomes even larger, ultimately stochastic effects become negligible, and the 
time-evolution of the evolving population can be described by a set of ordinary 
differential equations. This regime has been studied extensively in quasispecies 
theory, albeit often only for simple fitness functions. 

Here we investigate the regime between clonal interference and quasispecies 
theory. We seek to predict the evolutionary dynamics followed by a large yet 
finite population and how this dynamics differs from that of an infinite popu- 
lation. The study of finite-population effects requires a stochastic description 
based on a master equation [7] . We make no assumption about the fitness land- 
scape upon which the population evolves. We show that, averaged over time, 
the average fitness of a large finite population is lower than that of a population 
of infinite size. In other words, for large asexual populations evolving on a fixed 
fitness landscape, an increase in population size is accompanied by an increase 
in the average fitness. Furthermore, small mutation rates lead to high fluctu- 
ations and correlations. In particular, for small mutation rates, fluctuations 
and correlations in the number of individuals for a given genotype are inversely 
proportional to a power of the mutation rate. These large correlations enhance 
finite population effects and make the convergence to infinite-population behav- 
ior occur only for extremely large populations. 

This article is organized as follows. We describe the stochastic process
underlying our studies in section 2. We explain how this dynamic process can be
written as a field theory. We derive analytic results for the infinite population
evolution from this field theory. We describe finite population effects in section
3. We introduce the fitness landscape that we use to illustrate our results in
section 4. In section 5 we investigate fluctuations in this random process. We
verify our analytic results using stochastic simulations in section 5. We conclude
in section 6.

2 Stochastic Process Mapped to a Field Theory 

Throughout this article, we use the Moran process to model evolution of a
population [8]. The individuals in the population are identified by their genotype, a
sequence of length $l$. In this continuous-time process a constant population size,
$N$, is maintained by simultaneous replication and death. The individual to be
replicated is chosen randomly from the population with probability proportional
to its microscopic fitness, while the individual to be killed is chosen randomly
from the population with uniform probability. We further assume that
replication and mutation are independent. Thus, there are two classes of events:
mutation and replication. Mutation from genotype $i$ to genotype $j$ occurs at
a rate of $\mu A_{ij} N_i$, where $\mu$ is the mutation rate per locus, $N_i$ is the number of
individuals with genotype $i$, and $A_{ij}$ is equal to one if an individual can
mutate from sequence $i$ to sequence $j$ with a single mutation and is equal to
zero otherwise. This description allows for the incorporation of back-mutations,
which are often ignored in the literature. Note that the analytical results in this
paper do not depend on this binary form of the matrix $A$. Its elements can be
arbitrary non-negative numbers, as would be appropriate if back-mutation rates
differed from forward mutation rates. Replication of genotype $i$ and simultaneous
death of genotype $j$ occurs at a rate of $\frac{1}{N} r_i N_i N_j$, where $r_i$ is the replication
rate of sequence $i$. The stochastic master equation for this process is

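$$\frac{\partial}{\partial t}P(\mathbf{N};t) = \mu\sum_{i,j\neq i} A_{ij}\left[(N_i+1)\,P(\mathbf{N}+\mathbf{e}_i-\mathbf{e}_j;t) - N_i\,P(\mathbf{N};t)\right]
+ \frac{1}{N}\sum_i r_i\sum_{j\neq i}\left[(N_i-1)(N_j+1)\,P(\mathbf{N}-\mathbf{e}_i+\mathbf{e}_j;t) - N_iN_j\,P(\mathbf{N};t)\right]. \tag{1}$$

Here $\mathbf{N}$ is a vector describing the state of the population by the number of
individuals of each genotype, $(N_1, N_2, \ldots)$, and $\mathbf{e}_i$ is a unit vector associated
with genotype $i$. Note that $\sum_i N_i = N$.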
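
The two event classes above translate directly into a continuous-time stochastic simulation. The following is a minimal sketch, not the authors' code: the function name `simulate_moran` and its arguments are illustrative, and the event rates are taken from the text ($\mu A_{ij} N_i$ for mutation, $r_i N_i N_j / N$ for replication of $i$ with death of $j$).

```python
import random

def simulate_moran(A, r, mu, N0, T, rng):
    """One Gillespie trajectory of the Moran process with mutation, up to time T.

    A[i][j]: relative mutation rate from genotype i to j (assumed zero on the diagonal)
    r[i]:    replication rate of genotype i
    mu:      mutation rate per locus
    N0:      initial occupation numbers; their sum is the population size N
    rng:     a random.Random instance
    """
    counts = list(N0)
    n_types = len(counts)
    N = sum(counts)
    t = 0.0
    while True:
        # Each event moves one individual from genotype `src` to genotype `dst`.
        events = []
        for i in range(n_types):
            if counts[i] == 0:
                continue
            for j in range(n_types):
                if i == j:
                    continue
                if A[i][j] > 0:
                    # mutation i -> j at rate mu * A_ij * N_i
                    events.append((mu * A[i][j] * counts[i], i, j))
                if r[i] > 0 and counts[j] > 0:
                    # replication of i and death of j at rate r_i N_i N_j / N
                    events.append((r[i] * counts[i] * counts[j] / N, j, i))
        total = sum(rate for rate, _, _ in events)
        if total == 0.0:
            break  # absorbing state: nothing can happen any more
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > T:
            break
        pick = rng.random() * total
        for rate, src, dst in events:
            pick -= rate
            if pick <= 0.0:
                break
        counts[src] -= 1
        counts[dst] += 1
    return counts
```

Averaging the returned occupation numbers over many independent trajectories gives estimates of $\langle N_i \rangle(T)$ that can be compared with the analytic expressions derived below; the population size is conserved by construction.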

We obtain analytic expressions for the average occupation numbers and the 
fluctuations by mapping the stochastic process described in the previous section 
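onto a field theory following [9]. To do this we introduce the state vector

$$|\psi(t)\rangle = \sum_{\mathbf{N}} P(\mathbf{N};t)\,|\mathbf{N}\rangle, \tag{2}$$

whose time evolution is governed by

$$\frac{d}{dt}|\psi(t)\rangle = \sum_{\mathbf{N}}\Big\{\mu\sum_{i,j\neq i} A_{ij}\left[(N_i+1)\,P(\mathbf{N}+\mathbf{e}_i-\mathbf{e}_j;t) - N_i\,P(\mathbf{N};t)\right]
+ \frac{1}{N}\sum_i r_i\sum_{j\neq i}\left[(N_i-1)(N_j+1)\,P(\mathbf{N}-\mathbf{e}_i+\mathbf{e}_j;t) - N_iN_j\,P(\mathbf{N};t)\right]\Big\}\,|\mathbf{N}\rangle. \tag{3}$$

By defining annihilation and creation operators

$$a_i|\mathbf{N}\rangle = N_i\,|\mathbf{N}-\mathbf{e}_i\rangle, \qquad a_i^\dagger|\mathbf{N}\rangle = |\mathbf{N}+\mathbf{e}_i\rangle, \qquad a_i a_j^\dagger - a_j^\dagger a_i = \delta_{ij}, \tag{4}$$

we can write the governing equation for the state vector as

$$\frac{d}{dt}|\psi(t)\rangle = -\hat{H}\,|\psi(t)\rangle, \tag{5}$$

where

$$-\hat{H} = \mu\sum_{i,j} A_{ij}\,(a_j^\dagger - a_i^\dagger)\,a_i + \frac{1}{N}\sum_{i,j} r_i\,a_i^\dagger\,(a_i^\dagger - a_j^\dagger)\,a_i a_j. \tag{6}$$

This differential equation has the formal solution

$$|\psi(t)\rangle = e^{-\hat{H}t}\,|\psi(0)\rangle, \tag{7}$$

where $|\psi(0)\rangle = |\mathbf{N}^0\rangle$ is the initial distribution of individuals in the population.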
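At time $T$, the average of an observable represented by the (normal-ordered)
operator $F(\{a_i^\dagger, a_i\})$ can be obtained [10] by multiplying with the "sum bra"
$\langle\cdot| = \langle 0|\,e^{\sum_i a_i}$:

$$\langle F\rangle_T = \langle\cdot|\,F(\{a_i^\dagger,a_i\})\,|\psi(T)\rangle = \langle\cdot|\,F(\{a_i^\dagger,a_i\})\,e^{-\hat{H}T}\,|\mathbf{N}^0\rangle. \tag{8}$$

We introduce a Trotter factorization for the evolution operator $e^{-\hat{H}T}$, using a
time interval $\epsilon \to 0$, in the basis of coherent states defined by $a_i|z\rangle = z_i|z\rangle$, and
obtain a path integral representation

$$\langle\cdot|\,F(\{a_i^\dagger,a_i\})\,e^{-\hat{H}T}\,|\mathbf{N}^0\rangle = \langle\cdot|\,F(\{a_i^\dagger,a_i\})\,e^{-\epsilon\hat{H}}\cdots e^{-\epsilon\hat{H}}\,|\mathbf{N}^0\rangle
= \int[\mathcal{D}z^*\mathcal{D}z]\,F(\{z(T/\epsilon)\})\,e^{-S(z^*,z)}. \tag{9}$$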



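Here, the action in the exponent is, after the change of variables $z^* = 1 + \bar{z}$,

$$S(\bar{z},z) = \sum_i\left[\sum_{k=0}^{T/\epsilon}\bar{z}_i(k)\,z_i(k) - \sum_{k=1}^{T/\epsilon}\bar{z}_i(k)\,z_i(k-1) - N_i^0\ln\!\left(1+\bar{z}_i(0)\right)\right]
- \mu\epsilon\sum_{k=1}^{T/\epsilon}\sum_{i,j} A_{ij}\left(\bar{z}_j(k)-\bar{z}_i(k)\right)z_i(k-1)
- \frac{\epsilon}{N}\sum_{k=1}^{T/\epsilon}\sum_{i,j} r_i\left(1+\bar{z}_i(k)\right)\left(\bar{z}_i(k)-\bar{z}_j(k)\right)z_i(k-1)\,z_j(k-1). \tag{10}$$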

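The population dynamics in the limit as the population size, $N$, becomes
infinite emerges as a saddle point in the action [5]. Setting $\delta S/\delta z_i(t)|_c = 0$ leads
to $\bar{z}_{ci}(t) = 0$. From setting $\delta S/\delta\bar{z}_i(t)|_c = 0$ we obtain $z_{ci}(t) = N p_i(t)$, where $p_i(t)$
obeys the differential equation

$$\frac{dp_i}{dt} = \mu\sum_j\left(A_{ji}\,p_j - A_{ij}\,p_i\right) + r_i\,p_i - \langle r\rangle\,p_i. \tag{11}$$

Here $\langle r\rangle = \sum_j r_j p_j$ is the average fitness of the infinite population. This
differential equation has the closed-form solution [11]

$$p_i(t) = \frac{\sum_j \left(e^{Yt}\right)_{ij}\,p_j(0)}{\sum_{k,j}\left(e^{Yt}\right)_{kj}\,p_j(0)}, \tag{12}$$

where the matrix $Y$ is defined by $Y_{ij} = \mu A_{ji} - \mu\,\delta_{ij}\sum_k A_{jk} + \delta_{ij}\,r_j$.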
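
As a numerical cross-check of the infinite-population dynamics, Eq. (11) can also be integrated directly. The sketch below is an illustration rather than the authors' procedure: it uses a fixed-step fourth-order Runge-Kutta scheme instead of the closed-form solution, and the names `mean_field_rhs` and `integrate_mean_field` are hypothetical.

```python
def mean_field_rhs(p, A, r, mu):
    """Right-hand side of Eq. (11): mutation flux plus selection relative to the mean fitness."""
    n = len(p)
    mean_r = sum(r[i] * p[i] for i in range(n))
    return [
        mu * sum(A[j][i] * p[j] - A[i][j] * p[i] for j in range(n))
        + (r[i] - mean_r) * p[i]
        for i in range(n)
    ]

def integrate_mean_field(p0, A, r, mu, T, steps=2000):
    """Integrate Eq. (11) from t = 0 to t = T with classical RK4 (fixed step size)."""
    p = list(p0)
    n = len(p)
    h = T / steps

    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    for _ in range(steps):
        k1 = mean_field_rhs(p, A, r, mu)
        k2 = mean_field_rhs(shifted(p, k1, 0.5 * h), A, r, mu)
        k3 = mean_field_rhs(shifted(p, k2, 0.5 * h), A, r, mu)
        k4 = mean_field_rhs(shifted(p, k3, h), A, r, mu)
        p = [p[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6 for i in range(n)]
    return p
```

For a two-genotype chain with a single beneficial forward mutation, the frequencies returned by `integrate_mean_field` stay normalized and the fitter genotype takes over at long times, as expected from Eq. (12).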
3 Finite Population Shift to Probability Distribution



We proceed to quantify analytically how finite population effects alter the
infinite population dynamics. To do so we expand the action about the saddle
point and separate it into a Gaussian and a non-Gaussian part. Introducing
$z_i(k) = z_{ci}(k) + \delta z_i(k)$ and $\bar{z}_i(k) = \delta\bar{z}_i(k)$ in Eq. (10) we can write $S = S_0 + \Delta S$,
where the reference action $S_0$ can be written as

$$S_0 = \frac{1}{2}\,x^T\cdot\Pi_0^{-1}\cdot x, \tag{13}$$

where

$$x^T = \left(\{\delta\bar{z}(0),\delta z(0)\},\{\delta\bar{z}(1),\delta z(1)\},\ldots,\{\delta\bar{z}(T/\epsilon),\delta z(T/\epsilon)\}\right). \tag{14}$$

Here, the block structure of $\Pi_0^{-1}$ is specified by Eqs. (15) and (16) [not
recoverable from the extracted text] in terms of the matrices $A$ and $B$, which are

$$(A)_{ij} = \mu A_{ji} - \mu\,\delta_{ij}\sum_m A_{im} + \delta_{ij}\Big(r_i - \frac{1}{N}\sum_m r_m\,z_{cm}(k-1)\Big) + \frac{1}{N}\,(r_i - r_j)\,z_{ci}(k-1) \tag{17}$$

and

$$(B)_{ij} = 2\,\delta_{ij}\,r_i\,z_{ci}(k-1) - \frac{1}{N}\,(r_i + r_j)\,z_{ci}(k-1)\,z_{cj}(k-1). \tag{18}$$
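The non-Gaussian part of the action is given by

$$\Delta S = -\sum_i N_i^0\left[\ln\!\left(1+\delta\bar{z}_i(0)\right) - \delta\bar{z}_i(0) + \frac{1}{2}\left(\delta\bar{z}_i(0)\right)^2\right]
- \frac{\epsilon}{N}\sum_{k=1}^{T/\epsilon}\sum_{i,j}\Big[\,r_i\left(\delta\bar{z}_i(k)-\delta\bar{z}_j(k)\right)\delta z_i(k-1)\,\delta z_j(k-1)$$
$$\qquad + r_i\,\delta\bar{z}_i(k)\left(\delta\bar{z}_i(k)-\delta\bar{z}_j(k)\right)z_{ci}(k-1)\,\delta z_j(k-1)
+ r_i\,\delta\bar{z}_i(k)\left(\delta\bar{z}_i(k)-\delta\bar{z}_j(k)\right)\delta z_i(k-1)\,z_{cj}(k-1)$$
$$\qquad + r_i\,\delta\bar{z}_i(k)\left(\delta\bar{z}_i(k)-\delta\bar{z}_j(k)\right)\delta z_i(k-1)\,\delta z_j(k-1)\Big]. \tag{19}$$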

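This formulation allows us to calculate averages using the Gaussian action
and thermodynamic perturbation theory, which is equivalent to a cumulant
expansion. The average occupation numbers are given by

$$\langle N_a\rangle = \langle\cdot|\,a_a^\dagger a_a\,e^{-\hat{H}T}\,|\mathbf{N}^0\rangle = \langle\cdot|\,a_a\,e^{-\hat{H}T}\,|\mathbf{N}^0\rangle
= \int[\mathcal{D}z^*\mathcal{D}z]\;z_a(T/\epsilon)\,e^{-S(z^*,z)} \tag{20}$$
$$= \int[\mathcal{D}z^*\mathcal{D}z]\;z_a(T/\epsilon)\,e^{-\Delta S}\,e^{-S_0} = \left\langle z_a(T/\epsilon)\,e^{-\Delta S}\right\rangle_0 \tag{21}$$
$$= \left\langle z_a(T/\epsilon)\right\rangle_0 - \left\langle z_a(T/\epsilon)\,\Delta S\right\rangle_0 + \frac{1}{2}\left\langle z_a(T/\epsilon)\,(\Delta S)^2\right\rangle_0 + \cdots \tag{22}$$
$$= N p_a(T) - \left\langle\delta z_a(T/\epsilon)\,\Delta S\right\rangle_0 + \frac{1}{2}\left\langle\delta z_a(T/\epsilon)\,(\Delta S)^2\right\rangle_0 + \cdots, \tag{23}$$

where the last step follows from $\langle(\Delta S)^n\rangle_0 = 0\ \forall n\in\mathbb{Z},\ n\geq 1$. This procedure
leads to an asymptotic expansion for the occupation numbers in powers of $1/N$.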
To first order, we obtain

$$\frac{1}{N}\langle N_a\rangle(T) \sim p_a(T) + \frac{1}{N^2}\int_0^T dt\sum_{i,j}\Pi^{z\bar{z}}_{0,ai}(T,t)\,\Pi^{zz}_{0,ij}(t,t)\,(r_i - r_j). \tag{24}$$

This expansion about infinite size is accurate when the correction term on the
right-hand side of Eq. (24) is much smaller than $p_a(T)$. Equation (36) provides
an estimate of the magnitude of the correction for a common landscape with $k$
intermediate steps. The second order term is given by Eq. (A.1) in the appendix.
We derive expressions for the matrices $\Pi_0^{z\bar{z}}(T,t)$ and $\Pi_0^{zz}(t,t)$ by inverting
$\Pi_0^{-1}$. In continuous time for $T > t$, they obey

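$$\frac{\partial\,\Pi_0^{z\bar{z}}(T,t)}{\partial T} = A(T)\,\Pi_0^{z\bar{z}}(T,t), \tag{25}$$

with

$$\Pi_0^{z\bar{z}}(t,t) = I, \tag{26}$$

and

$$\frac{d\,\Pi_0^{zz}(t,t)}{dt} = B(t) + A(t)\,\Pi_0^{zz}(t,t) + \Pi_0^{zz}(t,t)\,A^T(t), \tag{27}$$

with

$$\Pi^{zz}_{0,ij}(0,0) = -\delta_{ij}\,N_i(0). \tag{28}$$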



Using the expression for the first-order shift to the occupation numbers due 
to finite population effects, we calculate the finite population shift in the average 
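fitness of the population. The average fitness correction is

$$\langle\delta r(T)\rangle = \frac{1}{N^2}\int_0^T dt\sum_{a,i,j} r_a\,\Pi^{z\bar{z}}_{0,ai}(T,t)\,\Pi^{zz}_{0,ij}(t,t)\,(r_i - r_j) \tag{29}$$
$$= -\frac{1}{N^2}\int_0^T dt\sum_{a,i,j} r_a\,\Pi^{z\bar{z}}_{0,ai}(T,t)\left(\Pi^{zz}_{0,ij}(t,t) + N\,\delta_{ij}\,p_i(t)\right)r_j. \tag{30}$$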
This result shows that the correction to the mean fitness is $O(1/N)$ relative to the mean
fitness in the limit of infinite population. This result can be rewritten in a more
revealing form. Let $f(t)$ be a random variable defined as

$$f(t) = \frac{1}{N}\sum_i r_i\left(N_i(t) - \langle N_i(t)\rangle\right) \tag{31}$$

in the limit of large population size. The finite population correction to the
average fitness can then be written as

$$\langle\delta r(T)\rangle = -\int_0^T\langle f(T)\,f(t)\rangle\,dt \tag{32}$$

and its time integral as

$$\int_0^T\langle\delta r(t)\rangle\,dt = -\int_0^T dt\int_0^t dt'\,\langle f(t)\,f(t')\rangle = -\frac{1}{2}\left\langle\left(\int_0^T f(t)\,dt\right)^2\right\rangle. \tag{33}$$


This expression for the average fitness correction, which resembles a fluctuation 
dissipation theorem, implies that the time-average of the finite-population shift 
is always negative. In other words, the average fitness of a large finite popula- 
tion is smaller than that of a population of infinite size. Note that this result 
is perturbative, valid for large population size $N$, and it does not require the
average fitness to be a monotonic function of $N$ for small $N$. On complex
fitness landscapes, it is possible for small asexual populations to achieve a higher
average fitness than larger ones [12]. Nonetheless, for sufficiently large
population sizes, the time-integrated average fitness increases monotonically with
population size.
Figure 1: Left-hand side: the state-space for a fitness landscape with three
forward-mutations and no back-mutations. Each node, $i$, is a particular
genotype. The replication rate of each genotype is $r_i$. Right-hand side (discussed
in Section 6): The state-space can be expanded to include mutational histories.
Each two-mutation state is split into 2! = 2 states while the three-mutation
state is split into 3! = 6 states. The node is now identified by a vector which
conveys the mutational history of a particular path through the landscape.

4 The Landscape 

The analytical expressions developed in this paper are applicable to arbitrary 
fitness landscapes and mutational pathways. However, we now describe in some 
detail the implications for fitness landscapes [13] defined by a certain number
of fitness loci $l$ with two alleles each. Genotypes that differ from each other
by exactly one point mutation in one of the loci are connected in the mutation
matrix. Each position in sequence space is thus connected by a mutation event
to $l$ other genotypes. Figure 1 shows the geometry of the landscape for the case
of three loci. Typically in this landscape, the fitness of each state increases upon
moving to the right in the figure.
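
The mutation matrix for such a landscape can be generated mechanically. The sketch below is illustrative only: it assumes genotypes are labeled by integers whose binary digits mark which loci carry a mutation, and the function name `mutation_matrix` and the `back_mutations` flag are not from the paper.

```python
def mutation_matrix(n_loci, back_mutations=True):
    """Binary matrix A with A[i][j] = 1 if genotype j is one point mutation away from i.

    Genotypes are encoded as integers 0 .. 2**n_loci - 1; bit b is set when locus b
    carries the mutant allele. With back_mutations=False only 0 -> 1 flips are allowed.
    """
    n = 1 << n_loci
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for b in range(n_loci):
            j = i ^ (1 << b)  # flip the allele at locus b
            # j > i exactly when the flipped bit was 0, i.e. a forward mutation
            if back_mutations or j > i:
                A[i][j] = 1
    return A
```

With three loci this gives eight genotypes, each connected to three neighbors when back-mutations are included; with `back_mutations=False` the fully mutated genotype has no outgoing mutations.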

5 Fluctuations around the Mean 

The matrices $\Pi_0^{zz}(t,t)$ and $\Pi_0^{zz}(T,t)$ can be understood intuitively. In the
limit of large $N$, the off-diagonal elements of $\Pi_0^{zz}(t,t)$ describe the covariances
between the occupation numbers at time $t$ while the diagonal elements are
related to the variances of the occupation numbers at time $t$ by

$$\frac{1}{N}\left\langle\left(\delta N_a(t)\right)^2\right\rangle \sim p_a(t) + \frac{1}{N}\,\Pi^{zz}_{0,aa}(t,t). \tag{34}$$

At different times, $\Pi_0^{zz}(T,t)$ gives the cross-covariances between the occupation
numbers at times $T$ and $t$. The matrix $\Pi_0^{z\bar{z}}(T,t)$ relates the correlations at
different times to the same-time correlations via

$$\Pi_0^{zz}(T,t) = \Pi_0^{z\bar{z}}(T,t)\,\Pi_0^{zz}(t,t). \tag{35}$$
We observe numerically that for small mutation rates, the fluctuations are
proportional to a negative power of the mutation rate. Specifically,

$$\Pi^{zz}_{0,ii}(t,t) \propto N\mu^{-k}, \tag{36}$$

where $k$ is the number of mutational steps, as shown in Fig. 2. This dependence
can also be shown analytically for sufficiently simple landscapes. See section B
in the appendix for one example. Thus the expansion, which naively appears
to be in $1/N$, is actually in $1/(N\mu^k)$. Thus, the expansion breaks down when
$\mu < 1/N^{1/k}$. The expansion is valid for large $N$ and $\mu \gg 1/N^{1/k}$.
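
To get a feeling for how restrictive this condition is, the effective expansion parameter $1/(N\mu^k)$ and the mutation rate at which it reaches one can be evaluated for given $N$ and $k$. This is a small illustrative helper, with hypothetical function names, that only restates the scaling quoted above; it ignores all landscape-dependent prefactors.

```python
def expansion_parameter(N, mu, k):
    """Effective small parameter 1 / (N * mu**k) of the finite-population expansion."""
    return 1.0 / (N * mu ** k)

def critical_mutation_rate(N, k):
    """Mutation rate N**(-1/k) below which the expansion is expected to break down."""
    return N ** (-1.0 / k)
```

For example, with $k = 3$ mutational steps and $\mu = 10^{-5}$ the condition $\mu \gg N^{-1/k}$ is equivalent to $N \gg \mu^{-k} = 10^{15}$, which illustrates why convergence to the infinite-population behavior can require extremely large populations.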




Figure 2: The maximal change of the variance with time (+), i.e.
$\max_{t,i}\, d\Pi^{zz}_{0,ii}(t,t)/dt$, where $\Pi_0^{zz}$ is obtained from Eqs. (27) and (28), depends on
the mutation rate as an inverse power law. Shown are calculations for a
non-epistatic version of the landscape as described in section 4 with a) two possible
mutations ($r_0 = 0$, $\Delta r_1 \approx 0.049$, $\Delta r_2 \approx 0.010$), b) three possible mutations
($r_0 = 0$, $\Delta r_1 \approx 0.049$, $\Delta r_2 \approx 0.010$, $\Delta r_3 \approx 0.002$), and c) four possible mutations
($r_0 = 0$, $\Delta r_1 \approx 0.049$, $\Delta r_2 \approx 0.020$, $\Delta r_3 \approx 0.006$, $\Delta r_4 \approx 0.002$). In this case,
the fitness of each state is simply the sum of contributions from each mutation.
The solid lines indicate power law fits using the values for $\mu < 10^{-5}$. Their
exponents are a) -1.999, b) -2.989, and c) -3.939. The magnitude of the exponent is
observed to be equal to the number of mutational steps in the landscape.

We verify our analytical results by performing stochastic simulations using
the Lebowitz/Gillespie algorithm [14, 15]. Rewriting Eq. (24) for the first order
shifts to the occupation numbers,

$$\langle N_a\rangle(T) - N p_a(T) \sim \frac{1}{N}\int_0^T dt\sum_{i,j}\Pi^{z\bar{z}}_{0,ai}(T,t)\,\Pi^{zz}_{0,ij}(t,t)\,(r_i - r_j), \tag{37}$$

we observe that the finite population correction converges to a constant value
for large $N$. The average replication rate in the population is linear in the
occupation numbers. It is equal to $\frac{1}{N}\sum_i r_i N_i(t)$. Therefore, the average replication
rate also converges to the quasispecies result in the limit of a large population.
That is, the average replication rate is equal to that of the infinite population
That is, the average replication rate is equal to that of the infinite population 
plus a correction that is of order $1/N$ smaller. Figure 3 shows this
convergence for one set of parameters. As a further check on our analytic results, we
fit a cubic polynomial in $1/N$ to the simulation data displayed in Fig. 3. For
the particular fitness parameters chosen here, the coefficients from this fit are
$320.4 \pm 2.5$ for the constant term and $(-5.3 \pm 0.8)\times 10^5$ for the linear term, while
our theory predicts $319.0$ and $-5.2\times 10^5$, respectively. Here, the coefficient of
the linear term is obtained from Eq. (A.1) in Appendix A. Similarly, we observe
Figure 3: (a) Finite-population correction to the average occupation numbers
(left-hand side of Eq. (37)) as a function of population size, $N$, on a three-mutation
landscape as shown in Fig. 1 including back-mutations. Shown are data for a
mutation rate of $\mu = 10^{-5}$ and replication rates of $r_0 = 0$, $r_1 \approx 0.049$, $r_2 \approx 0.010$,
$r_3 \approx 0.002$, $r_4 \approx 0.059$, $r_5 \approx 0.051$, $r_6 \approx 0.012$, and $r_7 \approx 0.061$. The time
is chosen as $T = 157.5$, which approximately maximizes $\langle N_0\rangle(T) - N p_0(T)$.
As $N$ increases, the corrections obtained from stochastic simulations, $N_0$ (×),
$N_1$ (○), $N_2$ (+), $N_3$ (∗), $N_4$ (□), $N_5$ (◇), $N_6$ (▽), $N_7$ (△), converge to the values
predicted by the theory (solid lines). The dashed curves show the second order
expansion, given by Eqs. (37) and (A.1). The error bars are one standard error.
(b) Finite-size correction to the mean population fitness. The average
replication rate in the population is linear in the occupation numbers, being equal to
$\frac{1}{N}\sum_i r_i N_i(t)$, and so it too converges to the quasispecies result in the limit of
a large population.

that the variances obtained from stochastic simulations agree with the analytic
expression given in Eq. (34), as shown in Fig. 4.



6 Discussion and Conclusion 

Although the theory described in this paper was developed to study the time- 
evolution of the occupation numbers in sequence space, we can immediately 
apply these results to investigate which mutational paths individuals take. This 
allows us to predict the large N behavior of the probability that a population will 
Figure 4: Variances divided by population size, $N$, as a function of $N$. The
values obtained from stochastic simulations, $N_0$ (×), $N_1$ (○), $N_2$ (+), $N_3$ (∗),
$N_4$ (□), $N_5$ (◇), $N_6$ (▽), $N_7$ (△), agree with the values predicted by Eq. (34)
(solid lines). The time and other parameters are the same as in Fig. 3. The
error bars are one standard error.



follow a certain mutational trajectory. To do this we simply expand the state 
space describing the identity of each individual to include not only the possible 
sequences but also the mutational histories. Figure 1 illustrates this expansion
for the case of three mutations. Figure 5 compares the probability of following
a given path as obtained from stochastic simulations to the expressions given in
Eqs. (24) and (A.1). We again observe that the simulation results converge to the
values predicted by the theory as the population size increases. Interestingly,
we observe numerically that the probability for a population to take a certain 
mutational path varies with the population size in a non-monotonic fashion. In 
particular, there is an intermediate population size at which the population is 
most likely to take the dominant path through the landscape. 
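
The expanded state space can be enumerated explicitly. The sketch below is an illustration, not the authors' construction in detail: it assumes no back-mutations, labels each state by the ordered tuple of loci that have mutated so far, and uses the hypothetical function name `mutational_histories`.

```python
from itertools import permutations

def mutational_histories(n_loci):
    """All states of the expanded space: ordered tuples of distinct mutated loci.

    The empty tuple is the ancestral sequence; a tuple such as (2, 1) means that
    locus 2 mutated first and locus 1 second.
    """
    states = []
    for n_mutations in range(n_loci + 1):
        states.extend(permutations(range(1, n_loci + 1), n_mutations))
    return states
```

For three loci this yields 1 + 3 + 6 + 6 = 16 states: each of the three two-mutation genotypes is split into 2! = 2 histories and the three-mutation genotype into 3! = 6, matching the right-hand side of Figure 1. The occupation of a final state such as `(1, 2, 3)` then measures the probability of the corresponding mutational path.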

Fluctuations due to finite population can be quite large. As shown in
Appendix B, these fluctuations are proportional to an inverse power of the mutation
rate. That is, the expansion in $1/N$ has a coefficient that depends on a power
of the inverse of the mutation rate. For this reason, convergence to the infinite
population limit can be exceedingly slow. The coefficient in the expansion in
$1/N$ also has a time dependence. As shown in Appendix C, this coefficient can
be proportional to t, and so diverge at long times. This divergence occurs when 
there are multiple final states, with equal replication rates. For example, the 
fluctuations diverge at long times in the expanded state space due to what may 
be termed fixation of path probabilities. 

In this paper we presented a path-integral formulation of evolution under a 
Moran-type process on arbitrary fitness landscapes. We derived analytic results 
that describe the dynamics exactly in the limit of an infinite population size 

Figure 5: Probability that a population will follow a certain mutational
trajectory as a function of population size, $N$. Shown are data for the landscape in
Fig. 1 excluding back-mutations with a mutation rate of $\mu$ = [value missing] and epistatic
replication rates of $r_0 = 0$, $r_1 \approx 0.049$, $r_2 \approx 0.010$, $r_3 \approx 0.002$, $r_4 \approx 0.012$,
$r_5 \approx 0.051$, $r_6 \approx 0.059$, and $r_7 \approx 0.061$. Equation (37) (solid lines) predicts the
asymptotic behavior of the simulation values, $N_{123}$ (×), $N_{213}$ (○), $N_{132}$ (◇),
$N_{231}$ (∗), $N_{321}$ (§), for large population sizes. The second order expansion
(dashed lines) improves the prediction for sufficiently large populations. The
error bars are one standard error.
and obtained an asymptotic expansion in the inverse of the population size
for finite populations. We showed that the finite population correction to the
time-averaged fitness is always negative, which implies that for sufficiently large
population sizes the time-averaged fitness increases with population size. We
also found that for small mutation rates, the infinite-population variances of the
occupation numbers behave as $\mu^{-k}$, where $k$ is the number of mutational steps
from the ancestral sequence. Finally, we showed how the formalism described in 
this paper can also be used to investigate which mutational path a population 
takes through the fitness landscape by expanding the sequence space to include 
mutational histories. 
A Second Order Correction

Equation (24) gives the terms up to $O(N^0)$ of an asymptotic expansion for the
average occupation numbers in powers of $1/N$. We here determine the second order,
$O(N^{-1})$, correction terms. Figure A.1 shows all possible vertices appearing in
the diagrams. Unlike the first correction term, which is derived from only the
single non-vanishing diagram shown in Fig. A.2, the second order correction
term comes from the nine different diagrams shown in Fig. A.3.

Figure A.1: Vertices (a) to (e) for the diagrammatic expansion. A white circle represents
an open time, while black circles stand for times that are integrated over.

Figure A.2: Diagram for the $O(N^0)$ correction to the average occupation
numbers.

We obtain

$$\frac{1}{N}\langle N_a\rangle(T) \sim p_a(T) + \frac{1}{N^2}\int_0^T dt\sum_{i,j}\Pi^{z\bar{z}}_{0,ai}(T,t)\,\Pi^{zz}_{0,ij}(t,t)\,(r_i - r_j) + \frac{1}{N}\langle N_a\rangle^{(2)}(T), \tag{A.1}$$
Figure A.3: Diagrams for the $O(N^{-1})$ correction to the average occupation
numbers with their multiplicities: (a) 2, (b) 1, (c) 2, (d) 4, (e) 2, (f) 8, (g) 4,
(h) 8, (i) 12.
where

(N a ) (2) (T) 



= 2^p£ dt T, ( n °atm t) n :f (T, t)) {n - r 3 ) 

dt' ]T n> (UoW (*> - n of (*, t')) Uo% (t, t')u ^, (f, f ) 
±_ jf # £ (no*f(r, t) - n ^(T, t)) ( rj - r,) 



/' 



/Vj;(IIofi!(t 1 f)-no?j5(i ) f))(r i '-r j .) f dt" ]T (r^-r^,) 
./o ^ Jo i „ >J .„ 

[(no?f„ (t, t") - n *J„ (t, *")) (nog, (t 7 , * / )n ^ i » (*", t") 

+2n -„(t',t")n -„(t',i")) 
+ 2 (n ? 2 „ (f, t") n ^„ (f, *")) (n ^, (t, i')n ^» (f, O 

+2no^(t,i")n ^„(i / ,i // ))] 



+ 4 / rfi E ( n °« ( T < *) - U ^(T, t)) (n rj) (A.2) 

f dt' Y J {^W{t,t , )-^,{t,t')){r i , -r r ) f dt" 
Jo ., ., Jo 



rj// 



[n *f„ (t, t") (n ^„ (*', t") - n ^.„ (*', *")) (n ^» (*', t")z cj „ (t") + n g,, (*', t")z C4 „ (*")) 
+ n £„(t / ,t") (no^(t,t") - n *f„(M")) (n ^(t , ,t ,, )^(*") + n ^(t',O*cXO) 
+ n ^,(t',t") (n *4„ (*',*") - n ?,*,/ (*',*")) (n ^,(M'> CJ "(O + n £„(t, (*"))] 

^ [ T dtJ2 (no'f (T, t) - Hog (T, i)) (r< - r,-) 

Y> /* dt" V n ? f V(*".* ,, )(ri»-r J -») 
Jo i , > . ( Jo .„_,.„ 

n S5(t,f) (n *f,(*,i') -n *J,(M')) (n ^,(t',t")z c ,/(t') + n ^V(* / ,O^(* / )) 



+ 4 / dt E ( n °« ( T < *) - n °«f ( T > *)) - r i) I dt ' E n °£^< - ^' 

]T n o#- (*, o)n ^ it', o)n ^„ (*', o)ni« (o) 
B Fluctuations Proportional to a Negative Power
of the Mutation Rate

In this appendix we consider a special case of the model described in section 2
for which we show analytically that for small mutation rates, $\mu$, the variance in
the infinite population occupation numbers is proportional to $N\mu^{-k}$, where $k$ is
the number of mutational steps in the landscape. We work in the limit that
$N \to \infty$. We seek to understand when the $1/N$ expansions of Eqs. (24) and (34) break
down. We will show that for small $\mu$, the naive expansion in $1/N$ is actually an
expansion in $1/(N\mu^k)$. The expansions in Eqs. (24) and (34), therefore, break
down when $\mu < 1/N^{1/k}$. In other words, the expansion is valid for large $N$
and $\mu \gg 1/N^{1/k}$. Let there be $k + 1$ positions in sequence space linked by $k$
mutations which occur at equal rate $\mu$ such that $A_{ij} = \delta_{i,j-1}$ for $i < k$, where
$\delta_{i,j}$ is the Kronecker delta. The fitness increases in the direction of mutations
(all mutations are beneficial) but the fitness increments decrease monotonically.
This landscape is commonly encountered when there is a dominant path through
a landscape. For example, we encountered this case when applying our theory
to long-term experimental studies of bacterial evolution [16]. Fig. B.1 shows a



N, 



fl 



ro 



i \i N 2 [i 
*r~2 



3 /J, 



r?, 



Tk-l fk 



Figure B.l: A simple landscape in which mutations occur at rate /i, without 
back mutation, the replication rate at position i is r^, and iVj is the occupation 
number at position i. 

graphical representation of this landscape. We assume that the mutation rate
is very small, $\mu \ll r$, and that there is no back mutation. Initially, the entire
population is in the starting state, $N_i(t = 0) = N\delta_{i,0}$. For this simple landscape
Eq. (11) can be solved explicitly for the infinite population occupation numbers.
In the limit as $\mu \to 0$, we have



where 



i i rbt 



i=0 b=0 



6=0 



(B.l) 
(B.2) 



i-iy 



6-1 



76- 



j=b+l ]=0 





b < i 



b> i. 



(B.3) 
Substitution into Eq. (11) confirms these solutions in the $\mu \to 0$ limit.

Let $C_{ij} = \lim_{N\to\infty}\left(\langle N_i N_j\rangle - \langle N_i\rangle\langle N_j\rangle\right)/N$ denote the infinite-population
covariance matrix. From section 5 we know that

$$C_{ij}(t) = \delta_{ij}\,p_i(t) + \frac{1}{N}\,\Pi^{zz}_{0,ij}(t,t). \tag{B.4}$$

In the limit of infinite $N$, the correlation matrix $C$ converges to a limit
independent of $N$. We can show that

$$\frac{dC(t)}{dt} = B(t) + A(t)\,C(t) + C(t)\,A^T(t), \tag{B.5}$$

with

$$C_{ij}(0) = 0 \tag{B.6}$$

and

$$B_{ij}(t) = -\left(\mu A_{ij}\,p_i(t) + \mu A_{ji}\,p_j(t) + (r_i + r_j)\,p_i(t)\,p_j(t)\right)
+ \delta_{ij}\Big(\mu\sum_a A_{ai}\,p_a(t) + \mu\sum_a A_{ia}\,p_i(t) + r_i\,p_i(t) + \langle r\rangle\,p_i(t)\Big). \tag{B.7}$$
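
Equations (B.5) to (B.7) can be integrated numerically alongside Eq. (11). The sketch below is illustrative and rests on stated assumptions: the drift matrix $A(t)$ is taken to be the Jacobian of Eq. (11) evaluated at the infinite-population frequencies (which reduces to Eq. (B.9) when a single genotype dominates), a simple forward-Euler step is used, and all function names are hypothetical.

```python
def mean_fitness(p, r):
    return sum(ri * pi for ri, pi in zip(r, p))

def drift_matrix(p, A, r, mu):
    """Assumed form of A(t): Jacobian of Eq. (11) at frequencies p (with sum(p) = 1)."""
    n = len(p)
    mean_r = mean_fitness(p, r)
    J = [[mu * A[j][i] + (r[i] - r[j]) * p[i] for j in range(n)] for i in range(n)]
    for i in range(n):
        J[i][i] += r[i] - mean_r - mu * sum(A[i])
    return J

def noise_matrix(p, A, r, mu):
    """Source term B(t) of Eq. (B.7)."""
    n = len(p)
    mean_r = mean_fitness(p, r)
    B = [[-(mu * A[i][j] * p[i] + mu * A[j][i] * p[j] + (r[i] + r[j]) * p[i] * p[j])
          for j in range(n)] for i in range(n)]
    for i in range(n):
        inflow = mu * sum(A[a][i] * p[a] for a in range(n))
        outflow = mu * sum(A[i]) * p[i]
        B[i][i] += inflow + outflow + (r[i] + mean_r) * p[i]
    return B

def integrate_covariance(p0, A, r, mu, T, steps=5000):
    """Forward-Euler integration of Eq. (11) for p and Eq. (B.5) for C, with C(0) = 0."""
    n = len(p0)
    p = list(p0)
    C = [[0.0] * n for _ in range(n)]
    h = T / steps
    for _ in range(steps):
        J = drift_matrix(p, A, r, mu)
        B = noise_matrix(p, A, r, mu)
        mean_r = mean_fitness(p, r)
        dC = [[B[i][j] + sum(J[i][m] * C[m][j] + C[i][m] * J[j][m] for m in range(n))
               for j in range(n)] for i in range(n)]
        dp = [mu * sum(A[j][i] * p[j] - A[i][j] * p[i] for j in range(n))
              + (r[i] - mean_r) * p[i] for i in range(n)]
        C = [[C[i][j] + h * dC[i][j] for j in range(n)] for i in range(n)]
        p = [p[i] + h * dp[i] for i in range(n)]
    return p, C
```

A useful consistency check is that every row of $C$ sums to zero at all times, because the total population size does not fluctuate; with the forms of $A$ and $B$ used here this property is preserved by the integration.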

To compute $B$, one is allowed to use the infinite $N$ values for $p_i(t)$ because
finite $N$ corrections to $p_i(t)$ lead to higher order terms in the expansion Eq.
(34). Let $t_0 = 0$, $t_a \equiv \ln(\Delta r_a/\mu)/\Delta r_a$, $0 < a \leq k$. We examine Eq. (B.2).
We consider $t > t_a$. Expression (B.2) for $p_a$ will be dominated by the last term
in the series, since the ratio of the magnitude of the last term to the second
to last term is $\exp(\Delta r_a t)\prod_{j=0}^{a-2}(r_{a-1}-r_j)/(r_a-r_j) = (\Delta r_a/\mu)\exp[\Delta r_a(t-t_a)]\prod_{j=0}^{a-2}(r_{a-1}-r_j)/(r_a-r_j)$,
and this is large for small $\mu$ and $t > t_a$. Furthermore,
the ratio of $p_a$ to $p_{a-1}$ is $(\mu/\Delta r_a)\exp(\Delta r_a t)\prod_{j=0}^{a-2}(r_{a-1}-r_j)/(r_a-r_j) =
\exp[\Delta r_a(t-t_a)]\prod_{j=0}^{a-2}(r_{a-1}-r_j)/(r_a-r_j)$, which is also large for $t > t_a$. The
time interval from $t_a$ to $t_{a+1}$ gets larger as $\mu$ gets smaller, so that the time
period during which $p_{a-1}$ and $p_a$ are of similar magnitude, $t \sim t_a$, becomes less
and less significant. Figure B.2 shows this result numerically. Finally, the ratio
of $p_{a+1}$ to $p_a$ is $\exp[\Delta r_{a+1}(t-t_{a+1})]\prod_{j=0}^{a-1}(r_a-r_j)/(r_{a+1}-r_j)$, which is small
for $t < t_{a+1}$. Thus, for small $\mu$, in the time interval $t_a$ to $t_{a+1}$, most of the
population is in state $a$. That is,

$$p_a(t) \gg p_{a'}(t), \qquad a' \neq a,\ t_a < t < t_{a+1},\ \mu \to 0. \tag{B.8}$$
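
The crossover times $t_a$ are easy to tabulate for a given set of replication rates. The helper below is a sketch under one assumption not spelled out in the surrounding text, namely that $\Delta r_a = r_a - r_{a-1}$ is the fitness increment of the $a$-th mutational step; the function name `crossover_times` is hypothetical.

```python
import math

def crossover_times(r, mu):
    """Times t_a = ln(dr_a / mu) / dr_a with dr_a = r[a] - r[a-1] (assumed), and t_0 = 0."""
    times = [0.0]
    for a in range(1, len(r)):
        dr = r[a] - r[a - 1]
        times.append(math.log(dr / mu) / dr)
    return times
```

For the replication rates used in Figure B.2, $r = (0, 1.00, 1.45, 1.65, 1.74)$, the times are increasing in $a$ because the fitness increments decrease, and each interval $t_{a+1} - t_a$ grows as $\mu$ is reduced, consistent with the statement that state $a$ dominates for longer and longer periods.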



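Using this result and keeping the lowest order in $\mu$ in Eq. (17) we find

$$A_{ij}(t) \sim (r_j - r_a)\,(\delta_{i,j} - \delta_{i,a}), \qquad t_a < t < t_{a+1},\ \mu \to 0, \tag{B.9}$$

such that

$$\frac{dC_{ij}(t)}{dt} \sim B_{ij}(t) + (r_i + r_j - 2r_a)\,C_{ij}(t) - \sum_n (r_n - r_a)\left(\delta_{j,a}\,C_{i,n} + \delta_{i,a}\,C_{n,j}\right). \tag{B.10}$$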







Figure B.2: Infinite population occupation numbers versus time for $k = 4$,
$r_0 = 0$, $r_1 = 1.00$, $r_2 = 1.45$, $r_3 = 1.65$, $r_4 = 1.74$, and three different values for
$\mu$: (a) $10^{-5}$, (b) $10^{-8}$, and (c) $10^{-11}$. The occupation numbers, $p_0$ (solid), $p_1$
(dotted), $p_2$ (dash-dotted), $p_3$ (dashed), $p_4$ (solid with circles), are calculated
using Eq. (12). Note that as $\mu$ becomes smaller, $p_a$ becomes more and more
dominant during the time interval $t_a < t < t_{a+1}$.



For this landscape, Eq. (B.7) reduces to

$$B_{ij}(t) = -\left[\mu\,\delta_{i,j-1}\,p_i(t) + \mu\,\delta_{i,j+1}\,p_j(t) + (r_i + r_j)\,p_i(t)\,p_j(t)\right]
+ \delta_{ij}\left(\mu\,p_{i-1}(t) + \mu\,(1 - \delta_{i,k})\,p_i(t) + r_i\,p_i(t) + \langle r\rangle\,p_i(t)\right) \tag{B.11}$$

and, in particular,

$$B_{kk}(t) = -2r_k\left(p_k(t)\right)^2 + \mu\,p_{k-1}(t) + r_k\,p_k(t) + \langle r\rangle\,p_k(t). \tag{B.12}$$

Substituting Eqs. (B.1) and (B.2) into this expression and keeping only the lowest
power of $\mu$, we obtain, for $t < t_1$,



k 



S fcfe (t<t 1 )~ rfe /x fe ^7^e r « t m^O (B.13) 



a=0 

and thus 

k 

dC ^ 1 ~ £ 7q V«< + 2r k C kk (t), t < t 1 ,fi -+ 0. (B.14) 

a=0 

Integrating and only keeping terms to lowest order in $\mu$ yields

(* \ 2rk/Ari k k 
^ (R15) 

For later time periods, the evolution of $C_{kk}(t_1 < t < t_k)$ is dominated by the
second term in Eq. (B.10) as $\mu \to 0$:

$$\frac{dC_{kk}(t)}{dt} \sim 2\,(r_k - r_a)\,C_{kk}(t), \qquad t_a < t < t_{a+1},\ 0 < a < k,\ \mu \to 0, \tag{B.16}$$

with solution



2=1 v ' 7 o'=0 ' 

t a <t < t a+ i,0 < a < k, fi -> 0. (B.17) 

Fig. B.3 shows the convergence of this approximation to Eq. (34) as $\mu \to 0$ for
one set of replication rates.




Figure B.3: Infinite population variance of the final state vs. time for $k = 4$,
$r_0 = 0$, $r_1 = 1.00$, $r_2 = 1.45$, $r_3 = 1.65$, $r_4 = 1.74$, and three different values for
$\mu$: (a) $10^{-3}$, (b) $10^{-5}$, and (c) $10^{-8}$. Exact values calculated using Eq. (34) (solid
lines) and the approximation given in Eq. (B.17) (dashed lines) are both shown.
Note that $C_{kk}(t_k) \propto \mu^{-k}$.



Using Eq. (B.17) we find that as $\mu \to 0$



la 



2-r. 



n 



a=0 



.7 = 1 



(B.18) 



The maximum of $C_{kk}(t)$ occurs near $t_k$. This result follows from Eq. (B.10). The
first term on the right-hand side of Eq. (B.10) only matters during $0 < t < t_1$.
After that, $B_{kk}$ has a larger power of $\mu$ than $C_{kk}$ does. The second term on the
right-hand side is zero for $t > t_k$. Thus, for $t > t_k$, only the third term on the
right-hand side matters, and it is negative. Thus, for $t > t_k$, $C_{kk}(t)$ decreases.
It is for this reason that the dashed curves in Fig. B.3 are shown for $0 < t < t_k$
only.
C Fluctuations in the Expanded State Space at
Large Times

Consider the expanded state space of a landscape as shown in Fig. 1 generalized
to an arbitrary number of loci. For any finite population size $N$, the only
sinks are the final states in which all mutations have occurred in some order,
all of which have the same replication rate. Thus, after a certain time $t_f$, the
occupation numbers at positions prior to the final states can be neglected so
that the dynamics can be described by Eq. (1) with a single replication rate $r$ and
without mutation,

$$\frac{\partial}{\partial t}P(\mathbf{N};t) = \frac{r}{N}\sum_{i,j\neq i}\left[(N_i-1)(N_j+1)\,P(\mathbf{N}-\mathbf{e}_i+\mathbf{e}_j;t) - N_iN_j\,P(\mathbf{N};t)\right].$$

From this we obtain that the average occupation numbers remain constant,

$$\langle N_a(t)\rangle = \mathrm{const} = \langle N_a(t_f)\rangle, \qquad t > t_f, \tag{C.1}$$

and that the covariances are

$$\Sigma_{ab}(t) = \langle N_a(t)N_b(t)\rangle - \langle N_a(t)\rangle\langle N_b(t)\rangle
= \left(1 - e^{-2r(t-t_f)/N}\right)\langle N_a\rangle\left(\delta_{ab}N - \langle N_b\rangle\right) + e^{-2r(t-t_f)/N}\,\Sigma_{ab}(t_f), \qquad t > t_f. \tag{C.2}$$

Expanding this to leading order in $N$ yields

$$\Sigma_{ab}(t) \sim 2r\,(t-t_f)\left(\delta_{ab}\langle N_a\rangle - \frac{1}{N}\langle N_a\rangle\langle N_b\rangle\right) + \Sigma_{ab}(t_f), \qquad t > t_f. \tag{C.3}$$

Note that the expansion in $N$ converges only for finite times.